Reliability Analysis of Fault Tolerant Systems with Multi-Fault Coverage
Abstract
Fault tolerance is an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic fault and error handling mechanisms play a crucial role in implementing fault tolerance, because an uncovered (undetected) fault may lead to a system or subsystem failure even when adequate redundancy exists. Examples of this effect can be found in computing systems, electrical power distribution networks, and pipelines carrying dangerous materials. Because an uncovered fault may lead to overall system failure, an excessive level of redundancy may even reduce system reliability. An accurate analysis must therefore account not only for the system structure but also for the system's fault and error handling behavior (often called coverage behavior). The appropriate coverage modeling approach depends on the type of fault-tolerant techniques used. The recent research literature emphasizes the importance of multi-fault coverage models, in which the effectiveness of recovery mechanisms depends on the coexistence of multiple faults in a group of elements that collectively participate in detecting and recovering from the faults in that group; such groups are also called fault level coverage (FLC) groups. However, methods for solving multi-fault coverage models are limited, primarily because of the complex dependencies introduced by the reconfiguration mechanisms. This paper suggests a modification of the generalized reliability block diagram (RBD) method for evaluating reliability indices of systems with multi-fault coverage. The suggested method, based on a universal generating function technique, computes the reliability indices of complex systems with multi-fault coverage using a straightforward recursive procedure. The proposed algorithm can easily be applied when FLC groups have a hierarchical structure. Illustrative examples are presented.
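The claim that excessive redundancy can reduce reliability can be illustrated with a minimal sketch. It assumes a much simpler model than the multi-fault coverage treated in the paper: n identical, independent elements in parallel, each working with probability p, and each failure covered independently with probability c (element-level coverage). The system is taken to work if at least one element works and every failure is covered. The function names and parameter values are hypothetical, and this is not the paper's universal generating function algorithm.

```python
from math import comb


def parallel_reliability(n: int, p: float, c: float) -> float:
    """Reliability of n parallel elements under an assumed element-level coverage model.

    p: probability that a single element works.
    c: probability that a single element failure is covered (detected and isolated).
    The system works if at least one element works and all k failures are covered.
    """
    q = 1.0 - p
    # Sum over k = number of failed elements; k = n (all failed) is excluded.
    return sum(comb(n, k) * p ** (n - k) * (q * c) ** k for k in range(n))


def best_redundancy(p: float, c: float, max_n: int) -> int:
    """Return the number of elements in 1..max_n that maximizes reliability."""
    return max(range(1, max_n + 1), key=lambda n: parallel_reliability(n, p, c))
```

Under these assumptions, with p = 0.9 and c = 0.9, reliability rises from one element to two and then declines as more elements are added, because each extra element adds another chance of an uncovered failure. With perfect coverage (c = 1) the expression reduces to the usual parallel-system formula and reliability only increases with n.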
Similar resources
Reliability Evaluation of Multi-state Systems Subject to Imperfect Coverage using OBDD
This paper presents an efficient approach based on OBDD for the reliability analysis of a multi-state system subject to imperfect fault-coverage with combinatorial performance requirements. Since there exist dependencies between combinatorial performance requirements, we apply the Multi-state Dependency Operation (MDO) of OBDD to deal with these dependencies in a multi-state system. In addition...
Incorporating Code Coverage in the Reliability Estimation for Fault-Tolerant Software
We present a technique that uses coverage measures in reliability estimation for fault tolerant programs, particularly N-version software. This technique exploits both coverage and time measures collected during testing phases for the individual program versions and the N-version software system for reliability prediction. The application of this technique on the single-version software was pre...
Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
Mathematical modeling and fuzzy availability analysis for serial processes in the crystallization system of a sugar plant
The binary states, i.e., success or failed state assumptions used in conventional reliability are inappropriate for reliability analysis of complex industrial systems due to lack of sufficient probabilistic information. For large complex systems, the uncertainty of each individual parameter enhances the uncertainty of the system reliability. In this paper, the concept of fuzzy reliability...
Analysis of a Multi-Layer Fault-Tolerant COTS Architecture for Deep Space Missions
Fault-tolerant systems are traditionally divided into fault containment regions and custom logic is added to ensure the effects of a fault within a containment region would not propagate to the other regions. This technique may not be applicable in a commercial-off-the-shelf (COTS) based system. While COTS technology is attractive due to its low cost, they are not developed with the same level ...